Dilek Hakkani-Tür

PIPA: A Unified Evaluation Protocol for Diagnosing Interactive Planning Agents

May 02, 2025
Via arXiv

TD-EVAL: Revisiting Task-Oriented Dialogue Evaluation by Combining Turn-Level Precision with Dialogue-Level Comparisons

Apr 28, 2025
Via arXiv

ToolRL: Reward is All Tool Learning Needs

Apr 16, 2025
Via arXiv

YourBench: Easy Custom Evaluation Sets for Everyone

Apr 02, 2025
Via arXiv

Persuade Me if You Can: A Framework for Evaluating Persuasion Effectiveness and Susceptibility Among Large Language Models

Mar 03, 2025
Via arXiv

SMART: Self-Aware Agent for Tool Overuse Mitigation

Feb 17, 2025
Via arXiv

Can a Single Model Master Both Multi-turn Conversations and Tool Use? CALM: A Unified Conversational Agentic Language Model

Feb 12, 2025
Via arXiv

Beyond Sample-Level Feedback: Using Reference-Level Feedback to Guide Data Synthesis

Feb 06, 2025
Via arXiv

Better Slow than Sorry: Introducing Positive Friction for Reliable Dialogue Systems

Jan 31, 2025
Via arXiv

LLMs are Vulnerable to Malicious Prompts Disguised as Scientific Language

Jan 23, 2025
Via arXiv